General notes on project: - max of 15 pages

Executive Summary {#executive-summary} - Tonah

Many countries around the world provide nationwide dietary guidelines to the public and require nutrition labels on food products to help people make informed choices about foods and drinks they consume. Every individual, however, has different nutrition needs and preferences according to their age, sex, ethnicity, height, weight, and physical activity level, among many other factors. Therefore, people may benefit from more personalized dietary recommendations. Our goal is to provide the basis for an algorithm that recommends foods a person should consume based on their nutrition needs. To this end, we will compare different approaches to grouping foods based on their nutrient profiles (i.e., spectrum clustering vs. k-means clustering) and demonstrate how others can use our data-driven food groupings to meet their dietary needs.

Introduction {#introduction} - Tonah

  • make sure to include citations - showing why this research matters (can you send me PDFs of the citations you include so I can include them in the final project doc)
    • create citations in the same way we did before in diabetes project

Methods

Data Summary /EDA {#eda} - Keana

Our dataset comes from the most recent version of the Canadian Nutrient File (CNF), updated in 2015. The database contains average nutrient values for foods available in Canada. These averages describe the generic version of each food unless a specific brand is included in the database. The dataset is bilingual: food names, descriptions, and background information are provided in both French and English. One major goal of this version was to update nutrient values for the foods that contribute the most sodium to the diet, since reducing the sodium content of foods is a major goal for manufacturers.

The database covers 5,690 foods in our cleaned dataset, ranging from cheese souffle to vanilla extract, and provides average nutrient levels per 100 grams.

Notably, some of the nutrients are subcomponents of other nutrients. Others, like the two measures of food energy (kcal and kJ), are simply different units for the same quantity (1 kcal = 4.184 kJ).
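As a quick check on that redundancy, the sketch below converts the mean energy reported in the appendix summary (219 kcal per 100 g) to kilojoules. It is a Python illustration only; the function name `kcal_to_kj` is ours and is not part of the original analysis.

```python
KJ_PER_KCAL = 4.184  # conversion factor stated in the text


def kcal_to_kj(kcal: float) -> float:
    """Convert food energy from kilocalories to kilojoules."""
    return kcal * KJ_PER_KCAL


# Mean energy from the appendix summary: 219 kcal per 100 g.
mean_kj = kcal_to_kj(219)
print(round(mean_kj))  # compare with the reported kJ mean of 915
```

The converted value lands within rounding distance of the reported kJ mean, which is what we would expect if the two columns carry the same information.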

It is also worth noting that many values are missing. For instance, only 0.982% of the rows have values for biotin (see the Missing values in clean dataset section).
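A missing-value share like this is the number of missing rows divided by the total number of rows. The sketch below shows the calculation on a hypothetical toy column, then repeats it with the counts reported for VITAMIN K in the appendix summary (2516 missing out of 5690). The helper name `missing_share` is an assumption for illustration.

```python
def missing_share(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


# Hypothetical toy column, not taken from the CNF data.
toy_column = [1.2, None, 0.4, None, None]
print(f"{missing_share(toy_column):.1%}")

# Counts from the appendix summary: VITAMIN K is missing in 2516 of 5690 rows.
print(f"{2516 / 5690:.1%}")
```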

The original dataset was relational, with unique identifiers for the main variables of interest. We therefore merged the separate tables on these identifiers (e.g., FoodID) using the left_join function. We then reshaped the merged long-format data into a wide format, with one row per food and one column per nutrient.

The original dataset was cleaned for analyses in the following ways:

(See the appendix for a full summary of the variables in the cleaned dataset.) All subsequent summary statistics and analyses in the paper use the cleaned dataset.

notes from here: chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/file:///C:/Users/keana/OneDrive%20-%20PennO365/Comp_transfer2018/Penn/fourth_yr/nutrients_project_stat571/cnf-fcen-csv/CNF%202015%20users_guide%20EN.pdf

description of file contents:

  • From FOOD NAME.csv: FoodID (merging var), FoodGroupID (merging var), FoodDescription
  • From NUTRIENT NAME.csv: NutrientID (merging var), NutrientName, NutrientUnit (possibly as control??)
  • From NUTRIENT AMOUNT.csv: NutrientValue, FoodID (merging var), NutrientID (merging var)
  • From FOOD GROUP.csv: FoodGroupID (merging var), FoodGroupName

Main file of interest is NUTRIENT AMOUNT.csv – contains long version of dataset, where there are multiple rows for each food, along with a col for the nutrient identifier and the nutrient value associated with that identifier

Will have to combine the multiple datasets in one to have all info in one place
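The combination step can be sketched as follows. The notes mention left_join for the actual merge; this is only a Python illustration of the same join-and-pivot idea, and every row, ID, and value below is a hypothetical toy example rather than CNF data.

```python
# Hypothetical toy rows that mimic the four CNF files described above.
food_name = [
    {"FoodID": 1, "FoodGroupID": 10, "FoodDescription": "Food A"},
    {"FoodID": 2, "FoodGroupID": 20, "FoodDescription": "Food B"},
]
food_group = {10: "Group X", 20: "Group Y"}  # FoodGroupID -> FoodGroupName
nutrient_name = {1: "PROTEIN", 2: "FAT (TOTAL LIPIDS)"}  # NutrientID -> NutrientName
nutrient_amount = [  # long format: one row per food-nutrient pair
    {"FoodID": 1, "NutrientID": 1, "NutrientValue": 9.0},
    {"FoodID": 1, "NutrientID": 2, "NutrientValue": 12.0},
    {"FoodID": 2, "NutrientID": 1, "NutrientValue": 0.1},
]

foods_by_id = {row["FoodID"]: row for row in food_name}

# Left-join each amount row to its food, food group, and nutrient name,
# then pivot to wide format: one row per food, one column per nutrient.
wide = {}
for amt in nutrient_amount:
    # Unmatched keys give empty lookups, as in a left join.
    food = foods_by_id.get(amt["FoodID"], {})
    row = wide.setdefault(amt["FoodID"], {
        "FoodDescription": food.get("FoodDescription"),
        "FoodGroupName": food_group.get(food.get("FoodGroupID")),
    })
    row[nutrient_name.get(amt["NutrientID"])] = amt["NutrientValue"]

for food_id, row in wide.items():
    print(food_id, row)
```

Food B has no fat row in the long table, so its wide row has no fat entry. This is how missing values arise after reshaping.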

“At present foods are grouped under 23 different group headings based on similar characteristics of the foods”

The food names are only available in this version in one length which does not include abbreviations and can be up to 255 characters long

  • All of the nutrient data is stored per 100g of the food (edible portion) - that is, all nutritional data is on the same scale

  • in cleaning - only selected variables of interest (e.g., we removed mean SE for the nutrient values, since there were many missing values)

Analyses {#analyses} - Jeesung (both describing the analyses that were run - aka methods and running the analyses)

As mentioned before, several variables have many NA values. To run our target analyses, we needed to handle these NA values. We explored two options to see whether the choice would affect the results: mean imputation for all variables, or first removing variables with more than 50% NAs and then applying mean imputation to the remaining variables.

The two options produced similar results, so all subsequent analyses use the version of the dataset with mean imputation for all variables.
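The two imputation options can be sketched as below. This is a minimal Python illustration on hypothetical toy columns; the function name `impute_means` and its `max_missing` parameter are assumptions, not the code used for the analyses.

```python
from statistics import mean


def impute_means(columns, max_missing=None):
    """Replace None with the column mean.

    columns: dict mapping column name -> list of values (None = missing).
    max_missing: if set, drop columns whose missing share exceeds it first.
    """
    result = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        share_missing = 1 - len(observed) / len(values)
        if max_missing is not None and share_missing > max_missing:
            continue  # option 2: drop sparsely observed variables
        if not observed:
            continue  # a fully missing column has no mean to impute
        col_mean = mean(observed)
        result[name] = [col_mean if v is None else v for v in values]
    return result


# Hypothetical toy data: column "b" is 75% missing.
toy = {"a": [1.0, None, 3.0, 4.0], "b": [None, None, None, 2.0]}
option_1 = impute_means(toy)                   # impute every variable
option_2 = impute_means(toy, max_missing=0.5)  # drop >50% missing, then impute
print(option_1)
print(option_2)
```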

Since our goal is to identify the best algorithm for grouping foods so that a person can plan meals that suit their needs, we compared different approaches to grouping foods based on their nutrient profiles. That is, we wanted to maximize within-group similarity and minimize between-group similarity in nutrient profiles. The first approach we explored was k-means clustering, applied to both versions of the dataset, using the silhouette method to identify the optimal number of clusters. (briefly EXPLAIN LOGIC BEHIND THIS METHOD).
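For reference, the standard definition of the silhouette width is given below. The notation is ours: $d(i,j)$ is the distance between foods $i$ and $j$, and $C_I$ is the cluster containing food $i$.

```latex
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\; j \neq i} d(i, j),
\qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j),
\qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```

Here $a(i)$ is the average distance from food $i$ to the other foods in its own cluster and $b(i)$ is the average distance to the foods in the nearest other cluster, so $s(i)$ lies between $-1$ and $1$. Values near 1 indicate a food that sits well inside its cluster. The silhouette method runs k-means for a range of $k$ and picks the $k$ with the largest average $s(i)$.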

We also ran spectrum clustering to see which foods are grouped together based on their nutrient profiles. In doing so, we wanted to find out whether there are clear-cut clusters of foods based on nutrient profile, regardless of actual food group assignment. Our grouping results can be used to find the list of foods that contain the best combination of nutrients for a person's dietary needs. This improves upon diets that rely on the conventional food groups, since our groupings should represent the nutrient profile of a set of foods better than the food group labels, which may be assigned arbitrarily. Since the data are high-dimensional (153 nutrients for 5690 unique foods), we ran principal component analysis (PCA) to reduce the dimensionality before clustering.

Then, we ran the analysis on a theoretically driven set of nutrients (based on WHO).

1. Data Preparation

2. Data Analyses

2.1. Regular K-means Clustering

1) Run k-means clustering

## [1] "Size of each cluster is" "9"                      
## [3] "21"                      "9"                      
## [5] "23"                      "97"                     
## [7] "1785"                    "3687"                   
## [9] "59"

2.1) Plotting two random variables (Carb and Energy (Kcal))

2.2) Plotting Wordclouds

3) Mean calories per cluster

4) Most important nutrients characterizing each cluster, based on absolute values of PC loadings

2.2. Spectrum Clustering

1) Comparing UNSCALED vs. SCALED data

1.1) Run PCA: UNSCALED and centered

1.2) Run PCA: SCALED and centered

Conclusion: scale or unscale?

Scaling the data is more reasonable since the variables are measured in different units. (We might want to remove the PVE comparison above and simply explain that it conceptually makes more sense to center and scale the data here.)
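To illustrate why scaling matters when units differ, the sketch below centers and scales two hypothetical columns so that each has mean 0 and standard deviation 1. The values and the helper name `standardize` are illustrative assumptions, not taken from the analysis.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical values on very different scales (mg vs. g per 100 g).
sodium_mg = [5.0, 80.0, 340.0, 1200.0]
protein_g = [0.5, 7.6, 11.0, 25.0]

z_sodium = standardize(sodium_mg)
z_protein = standardize(protein_g)
print([round(z, 2) for z in z_sodium])
print([round(z, 2) for z in z_protein])
```

Without this step, a variable recorded in milligrams would dominate the principal components simply because its raw numbers are larger.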

2) Most important nutrients used for spectrum clustering (PC1, PC2)

3) Run k-means clustering using 10 PCs

4.1) Plotting PC scores of Foods (note: all exploratory plots)

4.2) Plotting Wordclouds

5) Mean calories per cluster

6) Most important nutrients characterizing each cluster, based on absolute values of PC loadings

Which results to report? Compare Regular vs. Spectrum Clustering

Since k-means aims to minimize the total within-cluster sum of squares (SS), we compare the within-cluster SS of the two results to determine which option might be better. As seen below, the within-cluster SS is smaller across the board when k-means is applied to the 10 PCs than when it is applied to the original data.

## [1] 67.8
## [1] 2.70e+08 2.39e+09 1.33e+09 1.57e+09 3.98e+09 3.22e+09 4.11e+09 5.92e+08
## [1] 51
## [1]  8357 16634 11440 62118  9362 14693 31793 21739
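For readers unfamiliar with the quantity being compared, the sketch below computes the within-cluster sum of squares per cluster on hypothetical toy points. The function name `within_cluster_ss` is an assumption for illustration.

```python
def within_cluster_ss(points, labels):
    """Per-cluster sum of squared Euclidean distances to the cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    wss = {}
    for label, members in clusters.items():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        wss[label] = sum(
            (m[d] - centroid[d]) ** 2 for m in members for d in range(dim)
        )
    return wss


# Hypothetical toy data: two tight clusters of two points each.
toy_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
toy_labels = [1, 1, 2, 2]
wss = within_cluster_ss(toy_points, toy_labels)
print(wss, sum(wss.values()))
```

Note that this quantity depends on the units of the input, so totals computed from unscaled nutrient values and from PC scores are on different scales.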

3. Making Nutrient-Driven Personalized Dietary Recommendations (Based on Spectrum Clusters of Foods)

Scenario 1. Build muscles

Mark wants to build muscle and burn fat. His current weight is 180 lbs.

  • recommended protein intake: 1.5 g/pound, so 270 g/day for Mark
  • recommended carbohydrate intake: 2 to 3 g/pound, so 360 to 540 g/day
  • recommended fat intake: saturated fat, 20 to 30% of daily calories
  • recommended calorie count: 20 kcal/pound, so 3600 kcal
  • other nutrients that help build muscle: calcium, biotin, iron, vitamin C, selenium, omega-3, vitamin D, vitamin B12, copper, magnesium, riboflavin, zinc
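The per-pound targets for this scenario can be reproduced with a few lines of arithmetic. This is a minimal Python sketch using the figures stated in the scenario; the variable names are ours.

```python
WEIGHT_LB = 180  # Mark's weight in pounds, from the scenario

protein_g = 1.5 * WEIGHT_LB                   # 1.5 g of protein per pound
carb_g_range = (2 * WEIGHT_LB, 3 * WEIGHT_LB)  # 2 to 3 g of carbohydrate per pound
energy_kcal = 20 * WEIGHT_LB                  # 20 kcal per pound

print(protein_g, carb_g_range, energy_kcal)
```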

1) Create Data

2) Find the cluster for scenario 1
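One plausible way to match a scenario to a cluster is to pick the cluster whose centroid is closest to the target nutrient profile. The sketch below assumes Euclidean distance on standardized values. The centroids, the target, and the function name `nearest_cluster` are all hypothetical, and the method actually used for this step may differ.

```python
import math


def nearest_cluster(target, centroids):
    """Return the label of the centroid closest to the target profile."""
    return min(centroids, key=lambda label: math.dist(target, centroids[label]))


# Hypothetical standardized centroids over (protein, carbohydrate, fat).
centroids = {
    1: (-0.5, 1.2, -0.3),
    2: (1.4, -0.6, 0.2),
    3: (-0.2, -0.4, 1.8),
}
target = (1.5, 0.0, 0.0)  # hypothetical high-protein target profile

best = nearest_cluster(target, centroids)
print(best)
```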

3) 10 food Recommendations from Cluster 5

##  [1] Babyfood, snack, biscuit, mixed grain                     
##  [2] Egg, chicken, whole, fresh or frozen, raw                 
##  [3] Cream, table (coffee), 18% M.F.                           
##  [4] Spices, sage, ground                                      
##  [5] Guinea, meat and skin, raw                                
##  [6] Vegetable oil, palm                                       
##  [7] Fish oil, sardine                                         
##  [8] Cheese, Mexican, queso chihuahua                          
##  [9] Duck, domesticated, meat and skin, raw                    
## [10] Chicken, broiler, leg, meat and skin, batter dipped, fried
## 5689 Levels: Abiyuch, raw ... Zwieback

Scenario 2. Going on a keto diet to lose weight, particularly fat

Mark wants to try a ketogenic diet to lose weight. Popular ketogenic resources suggest that, on average, 70-80% of total daily calories come from fat, 5-10% from carbohydrate, and 10-20% from protein.
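To turn these calorie shares into gram targets, a daily calorie total is needed. The scenario does not state one, so the sketch below assumes a hypothetical 2000 kcal/day and uses the general Atwater factors (9 kcal/g for fat, 4 kcal/g for carbohydrate and protein). The function name is ours.

```python
KCAL_PER_G = {"fat": 9, "carbohydrate": 4, "protein": 4}  # Atwater general factors


def grams_from_share(total_kcal, share, nutrient):
    """Grams of a macronutrient supplying the given share of total calories."""
    return total_kcal * share / KCAL_PER_G[nutrient]


total_kcal = 2000  # hypothetical; the scenario does not give a calorie target

fat_g = (grams_from_share(total_kcal, 0.70, "fat"),
         grams_from_share(total_kcal, 0.80, "fat"))
carb_g = (grams_from_share(total_kcal, 0.05, "carbohydrate"),
          grams_from_share(total_kcal, 0.10, "carbohydrate"))
protein_g = (grams_from_share(total_kcal, 0.10, "protein"),
             grams_from_share(total_kcal, 0.20, "protein"))

for name, (low, high) in [("fat", fat_g), ("carbohydrate", carb_g), ("protein", protein_g)]:
    print(f"{name}: {low:.0f} to {high:.0f} g/day")
```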

1) Create Data

2) Find the cluster for scenario 2

3) 10 food Recommendations from Cluster 5

##  [1] Turkey, all classes, meat and skin, raw              
##  [2] Chicken, broiler, separable fat, raw                 
##  [3] Pear, dried halves, sulphured, cooked, no added sugar
##  [4] Shortening, industrial, for baking (cake), soybean   
##  [5] Pork, loin, tenderloin, lean and fat, broiled        
##  [6] Brussels sprouts, frozen, boiled, drained            
##  [7] Soup, chicken noodle, canned, condensed, water added 
##  [8] Turkey, tom, meat and skin, raw                      
##  [9] Cress, garden, boiled, drained                       
## [10] Egg substitute, frozen (yolk replaced)               
## 5689 Levels: Abiyuch, raw ... Zwieback

Scenario 3. Diabetes Diet

Mark is a male diabetes patient who weighs 70 kg. He wants to maintain his weight and will consume 1500 kcal per day.

-source: https://www.ncbi.nlm.nih.gov/books/NBK279012/#:~:text=As%20with%20the%20general%20population,of%20the%20daily%20carbohydrate%20intake, https://www.niddk.nih.gov/health-information/diabetes/overview/diet-eating-physical-activity

  • 0.8 g protein/kg desirable body weight
  • 14g of fiber per 1000 kcal ingested
  • limit: saturated fat, trans fat, sugar
  • limit sodium consumption to 2,300 mg/day.
  • increase: monounsaturated fats, polyunsaturated fats, starchy carbohydrates
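Applied to Mark's numbers (70 kg, 1500 kcal/day), the guidelines above give the daily targets computed below. This is a minimal Python sketch of the arithmetic; the variable names are ours.

```python
weight_kg = 70      # from the scenario
energy_kcal = 1500  # planned daily intake, from the scenario

protein_g = 0.8 * weight_kg        # 0.8 g protein per kg body weight
fiber_g = 14 * energy_kcal / 1000  # 14 g fiber per 1000 kcal
sodium_limit_mg = 2300             # upper limit stated in the guidelines

print(round(protein_g, 1), fiber_g, sodium_limit_mg)
```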

1) Create Data

2) Find the cluster for scenario 3

3) 10 food Recommendations from Cluster 3

##  [1] Cheese, Mexican, queso asadero                  
##  [2] Spices, cloves, ground                          
##  [3] Cheese, cheddar or colby type, low fat (7% M.F.)
##  [4] Vanilla extract                                 
##  [5] Spices, turmeric, ground                        
##  [6] Spices, paprika                                 
##  [7] Cheese, parmesan, shredded                      
##  [8] Cheese, mozzarella, (48% water, 25% M.F.)       
##  [9] Cheese fondue                                   
## [10] Cream, table (coffee), 15% M.F.                 
## 5689 Levels: Abiyuch, raw ... Zwieback

4. Theoretically Driven Nutrients & Spectrum Clustering

  • Choose the most crucial nutrients (https://www.medicalnewstoday.com/articles/326132) and run clustering based on them
  • Vitamins, minerals (magnesium, calcium, phosphorus, sulfur, sodium, potassium, chloride, iron, selenium, zinc, manganese, chromium, copper, iodine, fluoride, molybdenum), water, protein, carbohydrates, fats, energy (kcal)

  • will adopt spectrum clustering and scale the data when running PCA

1) Run PCA

2) Most important nutrients used for spectrum clustering (PC1, PC2)

3) Run k-means clustering

4.1) Plotting : two PCs (Note: all exploratory plots)

4.2) Plotting : wordcloud

5) Mean calories per cluster

6) Most important nutrients (PCA)

EXPLORATORY ANALYSES: REMOVE OVERLAPPING VARIABLES (CAN GO TO APPENDIX)

  • from the main analyses above, we found that “FAT” information is overrepresented when scaling is applied
  • we therefore included only “TOTAL FAT” and removed all subtypes of fat
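A sketch of this column filter is shown below. It assumes that the fat subtypes are the columns whose names start with "FATTY ACIDS" (names taken from the appendix summary); whether other fat-related columns such as CHOLESTEROL were also removed is not stated, so they are kept here.

```python
# A few column names from the appendix summary, for illustration.
columns = [
    "PROTEIN",
    "FAT (TOTAL LIPIDS)",
    "FATTY ACIDS, TRANS, TOTAL",
    "FATTY ACIDS, SATURATED, TOTAL",
    "FATTY ACIDS, SATURATED, 16:0, HEXADECANOIC",
    "CHOLESTEROL",
]

# Assumption: every "FATTY ACIDS, ..." column counts as a subtype of fat.
kept = [name for name in columns if not name.startswith("FATTY ACIDS")]
print(kept)
```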

1. Data Preparation

2. Data Analyses

2.1. Regular K-means Clustering

1) Run k-means clustering

## [1] "Size of each cluster is" "110"                    
## [3] "19"                      "67"                     
## [5] "30"                      "5464"

2.1) Plotting two random variables (Carb and Energy (Kcal))

2.2) Plotting Wordclouds

3) Mean calories per cluster

4) Most important nutrients characterizing each cluster, based on absolute values of PC loadings

2.2. Spectrum Clustering

1) Comparing UNSCALED vs. SCALED data

1.1) Run PCA: UNSCALED and centered

1.2) Run PCA: SCALED and centered

Conclusion: scale or unscale?

Scaling the data is more reasonable since the variables are measured in different units. (We might want to remove the PVE comparison above and simply explain that it conceptually makes more sense to center and scale the data here.)

2) Most important nutrients used for spectrum clustering (PC1, PC2)

3) Run k-means clustering using 10 PCs

4.1) Plotting PC scores of Foods (note: all exploratory plots)

4.2) Plotting Wordclouds

5) Mean calories per cluster

6) Most important nutrients characterizing each cluster, based on absolute values of PC loadings

Which results to report? Compare Regular vs. Spectrum Clustering

Results - Keana

Will only describe one mean imputation result (probably the one that best maximizes within-cluster similarity and minimizes between-cluster similarity across the different approaches).

First approach: k-means clustering on all available nutrients. Describe:

  • characteristics of clusters
  • size of clusters, within- and between-group SS
  • plot of the “clustering over two randomly chosen variables”
  • word clouds: interesting that cluster 1 and cluster 8 both have “dehydrated” as a common word
  • mean kcal per cluster: cluster 6 (seems to be associated with high-sugar foods) has the highest kcal level
  • most important nutrients based on PCA

Second approach: spectrum clustering on all available nutrients. Describe:

  • results from PCA (scree plot), elbow rule
  • characteristics of clusters
  • size of clusters, within- and between-group SS
  • plot of the “8 food clusters based on 7 PCs”
  • word clouds: results are basically the same here
  • mean kcal per cluster
  • most important nutrients based on PCA

Third approach: spectrum clustering on theoretically driven nutrients. Describe:

  • plot of the “clustering over two randomly chosen variables”
  • word clouds
  • mean kcal per cluster
  • most important nutrients based on PCA

Once we’ve decided on the best method - Make recommendation for a person who has certain dietary needs


Conclusion {#conclusion} - Keana

chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/file:///C:/Users/keana/OneDrive%20-%20PennO365/Comp_transfer2018/Penn/fourth_yr/nutrients_project_stat571/cnf-fcen-csv/CNF%202015%20users_guide%20EN.pdf - The CNF is particularly suited for assessment of diets, recipe development, menu planning when ingredients or menu items are not specific and for population nutrition surveillance activities, where nutrient intake distributions are used to conduct risk assessments such as modeling for fortification proposals. It is also useful in the initial stages of product development to ensure that nutritional targets can be met.

Limitations {#limitations} - Keana

chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/file:///C:/Users/keana/OneDrive%20-%20PennO365/Comp_transfer2018/Penn/fourth_yr/nutrients_project_stat571/cnf-fcen-csv/CNF%202015%20users_guide%20EN.pdf

  • The exact nutrient composition of a specific apple or cookie is not found on the CNF. These averages, except where indicated otherwise, take into account sources of a given food across Canada. Local foods may have a different profile than the national average.
  • Most users are looking for an average or mean value for a generic representation of the foods as described. These generic values have been derived from combining brands of similar products, for example all major brands of ketchup; various varieties of oranges or similar beef cuts from various producers.
  • This dataset is only relevant to products available in Canada, so the results cannot be generalized to products from other countries. Future research should therefore explore whether these findings replicate among products in other countries.
  • The nutrient values are all standardized (per 100 g) and do not represent how much a person may actually consume from a package. To be interpretable, they would need to be converted to nutrient values for the portions people actually eat.

Appendix {#appendix} - Keana

Full summary of clean dataset

Data Frame Summary

wide_data

Dimensions: 5690 x 154
Duplicates: 0

Variable 1: FoodGroupName [factor]. Missing: 0 (0.0%)

| Level | Freq (% of Valid) |
|---|---|
| 1. Babyfoods | 94 (1.7%) |
| 2. Baked Products | 441 (7.8%) |
| 3. Beef Products | 170 (3.0%) |
| 4. Beverages | 243 (4.3%) |
| 5. Breakfast cereals | 212 (3.7%) |
| 6. Cereals, Grains and Pasta | 155 (2.7%) |
| 7. Dairy and Egg Products | 241 (4.2%) |
| 8. Fast Foods | 174 (3.1%) |
| 9. Fats and Oils | 144 (2.5%) |
| 10. Finfish and Shellfish Pro | 325 (5.7%) |
| [ 13 others ] | 3491 (61.4%) |

Variables 2 to 110 [numeric]

| No | Variable | Mean (sd) | min < med < max | IQR (CV) | Distinct values | Missing |
|---|---|---|---|---|---|---|
| 2 | PROTEIN | 11.1 (10.8) | 0 < 7.6 < 85.6 | 16.6 (1) | 2261 | 0 (0.0%) |
| 3 | FAT (TOTAL LIPIDS) | 10 (16.7) | 0 < 3.8 < 100 | 11.6 (1.7) | 1913 | 0 (0.0%) |
| 4 | CARBOHYDRATE, TOTAL (BY DIFFERENCE) | 22 (26.5) | 0 < 10.3 < 100 | 31.6 (1.2) | 2756 | 0 (0.0%) |
| 5 | ASH, TOTAL | 1.9 (3.5) | 0 < 1.2 < 99.8 | 1.2 (1.8) | 646 | 1 (0.0%) |
| 6 | ENERGY (KILOCALORIES) | 219 (174) | 0 < 174 < 902 | 240 (0.8) | 665 | 0 (0.0%) |
| 7 | ALCOHOL | 0.1 (1.8) | 0 < 0 < 42.5 | 0 (14.7) | 37 | 325 (5.7%) |
| 8 | MOISTURE | 55 (31) | 0 < 64.7 < 100 | 50.3 (0.6) | 3417 | 0 (0.0%) |
| 9 | CAFFEINE | 3.9 (101) | 0 < 0 < 5714 | 0 (25.6) | 61 | 312 (5.5%) |
| 10 | THEOBROMINE | 7 (77.1) | 0 < 0 < 2634 | 0 (11.1) | 130 | 338 (5.9%) |
| 11 | ENERGY (KILOJOULES) | 915 (727) | 0 < 727 < 3774 | 1006 (0.8) | 1658 | 1 (0.0%) |
| 12 | SUGARS, TOTAL | 7.7 (15) | 0 < 1.3 < 99.8 | 7.6 (1.9) | 1505 | 1046 (18.4%) |
| 13 | FIBRE, TOTAL DIETARY | 2.4 (4.8) | 0 < 0.8 < 79 | 2.8 (2) | 237 | 224 (3.9%) |
| 14 | CALCIUM | 76.9 (220) | 0 < 24 < 7364 | 60 (2.9) | 468 | 51 (0.9%) |
| 15 | IRON | 2.6 (5.6) | 0 < 1.1 < 124 | 2.1 (2.2) | 882 | 51 (0.9%) |
| 16 | MAGNESIUM | 39.7 (64.8) | 0 < 21 < 781 | 23 (1.6) | 302 | 214 (3.8%) |
| 17 | PHOSPHORUS | 168 (236) | 0 < 130 < 9918 | 176 (1.4) | 620 | 153 (2.7%) |
| 18 | POTASSIUM | 308 (447) | 0 < 232 < 16500 | 215 (1.5) | 895 | 165 (2.9%) |
| 19 | SODIUM | 333 (1219) | 0 < 82 < 38758 | 339 (3.7) | 1099 | 43 (0.8%) |
| 20 | ZINC | 1.6 (3) | 0 < 0.8 < 91 | 1.8 (1.9) | 695 | 220 (3.9%) |
| 21 | COPPER | 0.2 (0.6) | 0 < 0.1 < 15.1 | 0.2 (2.8) | 787 | 270 (4.7%) |
| 22 | MANGANESE | 0.6 (3.7) | 0 < 0.1 < 133 | 0.4 (6.1) | 1226 | 585 (10.3%) |
| 23 | SELENIUM | 14.6 (36.7) | 0 < 6.9 < 1917 | 19.4 (2.5) | 614 | 722 (12.7%) |
| 24 | RETINOL | 88.8 (840) | 0 < 0 < 30000 | 11 (9.5) | 326 | 499 (8.8%) |
| 25 | BETA CAROTENE | 292 (1711) | 0 < 0 < 42891 | 33 (5.9) | 612 | 653 (11.5%) |
| 26 | ALPHA-TOCOPHEROL | 1.2 (4.1) | 0 < 0.3 < 149 | 0.6 (3.5) | 447 | 1555 (27.3%) |
| 27 | VITAMIN D (INTERNATIONAL UNITS) | 23.9 (241) | 0 < 0 < 12716 | 6 (10.1) | 214 | 692 (12.2%) |
| 28 | VITAMIN D (D2 + D3) | 0.6 (6.3) | 0 < 0 < 318 | 0.2 (10) | 129 | 690 (12.1%) |
| 29 | VITAMIN C | 8.2 (53.2) | 0 < 0.1 < 1900 | 3.6 (6.5) | 458 | 184 (3.2%) |
| 30 | THIAMIN | 0.2 (0.6) | 0 < 0.1 < 23.4 | 0.2 (2.6) | 812 | 280 (4.9%) |
| 31 | RIBOFLAVIN | 0.2 (0.5) | 0 < 0.1 < 17.5 | 0.2 (2.1) | 709 | 261 (4.6%) |
| 32 | NIACIN (NICOTINIC ACID) PREFORMED | 3.1 (4.4) | 0 < 1.6 < 128 | 4.4 (1.4) | 2828 | 234 (4.1%) |
| 33 | TOTAL NIACIN EQUIVALENT | 5.2 (5.6) | 0 < 3.5 < 132 | 7.2 (1.1) | 3908 | 234 (4.1%) |
| 34 | PANTOTHENIC ACID | 0.6 (0.9) | 0 < 0.4 < 21.9 | 0.7 (1.5) | 1316 | 936 (16.4%) |
| 35 | VITAMIN B-6 | 0.2 (1) | 0 < 0.1 < 68.8 | 0.2 (4.4) | 756 | 397 (7.0%) |
| 36 | TOTAL FOLACIN | 37.7 (93.4) | 0 < 12 < 3786 | 35 (2.5) | 290 | 408 (7.2%) |
| 37 | VITAMIN B-12 | 1.1 (6.8) | 0 < 0 < 380 | 0.7 (6.1) | 899 | 354 (6.2%) |
| 38 | VITAMIN K | 20.8 (99.9) | 0 < 1.7 < 1714 | 6 (4.8) | 434 | 2516 (44.2%) |
| 39 | FOLIC ACID | 8.4 (49.1) | 0 < 0 < 2993 | 0 (5.8) | 160 | 160 (2.8%) |
| 40 | TRYPTOPHAN | 0.1 (0.1) | 0 < 0.1 < 1.6 | 0.2 (0.9) | 458 | 1835 (32.2%) |
| 41 | THREONINE | 0.5 (0.5) | 0 < 0.3 < 3.7 | 0.8 (0.9) | 1300 | 1782 (31.3%) |
| 42 | ISOLEUCINE | 0.6 (0.5) | 0 < 0.4 < 5 | 0.8 (0.9) | 1369 | 1778 (31.2%) |
| 43 | LEUCINE | 1 (0.9) | 0 < 0.7 < 7.2 | 1.4 (0.9) | 1834 | 1782 (31.3%) |
| 44 | LYSINE | 0.9 (0.9) | 0 < 0.4 < 5.8 | 1.6 (1) | 1698 | 1764 (31.0%) |
| 45 | METHIONINE | 0.3 (0.3) | 0 < 0.2 < 3.2 | 0.5 (1) | 859 | 1767 (31.1%) |
| 46 | CYSTINE | 0.2 (0.2) | 0 < 0.1 < 2.1 | 0.2 (1) | 495 | 1842 (32.4%) |
| 47 | PHENYLALANINE | 0.5 (0.5) | 0 < 0.5 < 5.2 | 0.7 (0.9) | 1274 | 1782 (31.3%) |
| 48 | TYROSINE | 0.4 (0.4) | 0 < 0.3 < 3.3 | 0.6 (0.9) | 1131 | 1811 (31.8%) |
| 49 | VALINE | 0.6 (0.6) | 0 < 0.4 < 6.2 | 0.9 (0.9) | 1428 | 1778 (31.2%) |
| 50 | ARGININE | 0.8 (0.8) | 0 < 0.5 < 7.4 | 1.2 (1) | 1626 | 1791 (31.5%) |
| 51 | HISTIDINE | 0.4 (0.4) | 0 < 0.2 < 2.3 | 0.6 (1) | 1075 | 1784 (31.4%) |
| 52 | ALANINE | 0.7 (0.7) | 0 < 0.4 < 8 | 1 (1) | 1489 | 1836 (32.3%) |
| 53 | ASPARTIC ACID | 1.2 (1.1) | 0 < 0.8 < 10.2 | 1.7 (0.9) | 1936 | 1850 (32.5%) |
| 54 | GLUTAMIC ACID | 2.4 (12.3) | 0 < 1.9 < 757 | 2.9 (5.2) | 2432 | 1833 (32.2%) |
| 55 | GLYCINE | 0.6 (0.7) | 0 < 0.4 < 19 | 1 (1.1) | 1438 | 1835 (32.2%) |
| 56 | PROLINE | 0.7 (0.6) | 0 < 0.6 < 12.3 | 0.8 (1) | 1419 | 1843 (32.4%) |
| 57 | SERINE | 0.5 (0.5) | 0 < 0.5 < 6.1 | 0.7 (0.9) | 1288 | 1844 (32.4%) |
| 58 | CHOLESTEROL | 41.5 (138) | 0 < 1 < 3100 | 61 (3.3) | 291 | 194 (3.4%) |
| 59 | FATTY ACIDS, TRANS, TOTAL | 0.3 (1.7) | 0 < 0 < 37.6 | 0.2 (5.9) | 498 | 3559 (62.5%) |
| 60 | FATTY ACIDS, SATURATED, TOTAL | 3.1 (5.8) | 0 < 1.1 < 95.6 | 3.5 (1.9) | 2812 | 238 (4.2%) |
| 61 | FATTY ACIDS, SATURATED, 8:0, OCTANOIC | 0 (0.2) | 0 < 0 < 7.5 | 0 (7.1) | 266 | 1668 (29.3%) |
| 62 | FATTY ACIDS, SATURATED, 10:0, DECANOIC | 0 (0.2) | 0 < 0 < 6 | 0 (5.1) | 350 | 1364 (24.0%) |
| 63 | FATTY ACIDS, SATURATED, 12:0, DODECANOIC | 0.2 (1.7) | 0 < 0 < 47 | 0 (8.7) | 448 | 1201 (21.1%) |
| 64 | FATTY ACIDS, SATURATED, 14:0, TETRADECANOIC | 0.2 (0.9) | 0 < 0 < 22.8 | 0.2 (3.5) | 787 | 788 (13.8%) |
| 65 | FATTY ACIDS, SATURATED, 16:0, HEXADECANOIC | 1.7 (2.8) | 0 < 0.7 < 43.5 | 2.1 (1.7) | 2322 | 602 (10.6%) |
| 66 | FATTY ACIDS, SATURATED, 18:0, OCTADECANOIC | 0.8 (1.7) | 0 < 0.3 < 33.2 | 0.9 (2) | 1675 | 615 (10.8%) |
| 67 | FATTY ACIDS, MONOUNSATURATED, 18:1undifferentiated, OCTADECENOIC | 3.5 (7.2) | 0 < 1 < 82.6 | 4 (2) | 2708 | 578 (10.2%) |
| 68 | FATTY ACIDS, POLYUNSATURATED, 18:2undifferentiated, LINOLEIC, OCTADECADIENOIC | 1.8 (4.7) | 0 < 0.4 < 74.6 | 1.5 (2.6) | 2079 | 561 (9.9%) |
| 69 | FATTY ACIDS, POLYUNSATURATED, 18:3undifferentiated, LINOLENIC, OCTADECATRIENOIC | 0.2 (1.2) | 0 < 0.1 < 53.4 | 0.1 (6) | 689 | 656 (11.5%) |
| 70 | FATTY ACIDS, POLYUNSATURATED, 20:4, EICOSATETRAENOIC | 0 (0.1) | 0 < 0 < 1.8 | 0 (2.6) | 261 | 1210 (21.3%) |
| 71 | FATTY ACIDS, POLYUNSATURATED, 22:6 n-3, DOCOSAHEXAENOIC (DHA) | 0 (0.5) | 0 < 0 < 18.2 | 0 (9.6) | 296 | 137 (2.4%) |
| 72 | FATTY ACIDS, MONOUNSATURATED, 16:1undifferentiated, HEXADECENOIC | 0.2 (1) | 0 < 0 < 18.9 | 0.2 (4) | 770 | 836 (14.7%) |
| 73 | FATTY ACIDS, POLYUNSATURATED, 18:4, OCTADECATETRAENOIC | 0 (0.1) | 0 < 0 < 3 | 0 (10.2) | 126 | 1781 (31.3%) |
| 74 | FATTY ACIDS, POLYUNSATURATED, 20:5 n-3, EICOSAPENTAENOIC (EPA) | 0 (0.4) | 0 < 0 < 13.2 | 0 (9.7) | 259 | 1306 (23.0%) |
| 75 | FATTY ACIDS, MONOUNSATURATED, 22:1undifferentiated, DOCOSENOIC | 0 (0.8) | 0 < 0 < 41.2 | 0 (17.3) | 199 | 1532 (26.9%) |
| 76 | FATTY ACIDS, POLYUNSATURATED, 22:5 n-3, DOCOSAPENTAENOIC (DPA) | 0 (0.2) | 0 < 0 < 5.6 | 0 (11.1) | 162 | 149 (2.6%) |
| 77 | FATTY ACIDS, MONOUNSATURATED, TOTAL | 3.9 (7.8) | 0 < 1.2 < 83.7 | 4.5 (2) | 2880 | 314 (5.5%) |
| 78 | FATTY ACIDS, POLYUNSATURATED, TOTAL | 2.2 (5.2) | 0 < 0.6 < 74.6 | 1.8 (2.4) | 2381 | 316 (5.6%) |
| 79 | NATURALLY OCCURRING FOLATE | 29.2 (75.3) | 0 < 9 < 2340 | 20 (2.6) | 261 | 503 (8.8%) |
| 80 | RETINOL ACTIVITY EQUIVALENTS | 115 (836) | 0 < 3 < 30000 | 33 (7.3) | 463 | 260 (4.6%) |
| 81 | DIETARY FOLATE EQUIVALENTS | 44.4 (119) | 0 < 12 < 5881 | 41 (2.7) | 332 | 499 (8.8%) |
| 82 | FATTY ACIDS, POLYUNSATURATED, 18:2 c,c n-6, LINOLEIC, OCTADECADIENOIC | 2.3 (6) | 0 < 0.5 < 74.6 | 1.5 (2.6) | 1285 | 3256 (57.2%) |
| 83 | FATTY ACIDS, POLYUNSATURATED, 20:3, EICOSATRIENOIC | 0 (0) | 0 < 0 < 1.4 | 0 (7.1) | 89 | 2322 (40.8%) |
| 84 | FATTY ACIDS, POLYUNSATURATED, 18:3 c,c,c n-3 LINOLENIC, OCTADECATRIENOIC | 0.2 (1.3) | 0 < 0 < 53.4 | 0.1 (6.2) | 619 | 954 (16.8%) |
| 85 | FATTY ACIDS, POLYUNSATURATED, 18:3 c,c,c n-6, g-LINOLENIC, OCTADECATRIENOIC | 0 (0) | 0 < 0 < 1 | 0 (18.1) | 49 | 307 (5.4%) |
| 86 | BETA CRYPTOXANTHIN | 15.2 (198) | 0 < 0 < 6252 | 0 (13) | 128 | 2334 (41.0%) |
| 87 | LYCOPENE | 220 (1807) | 0 < 0 < 46260 | 0 (8.2) | 190 | 2324 (40.8%) |
| 88 | LUTEIN AND ZEAXANTHIN | 260 (1387) | 0 < 0 < 19697 | 39 (5.3) | 419 | 2346 (41.2%) |
| 89 | FATTY ACIDS, POLYUNSATURATED, 20:3 n-6, EICOSATRIENOIC | 0 (0) | 0 < 0 < 1.4 | 0 (14.6) | 71 | 521 (9.2%) |
| 90 | FATTY ACIDS, POLYUNSATURATED, 20:4 n-6, ARACHIDONIC | 0 (0.1) | 0 < 0 < 1.8 | 0 (2.7) | 227 | 2625 (46.1%) |
| 91 | FATTY ACIDS, POLYUNSATURATED, 20:3 n-3 EICOSATRIENOIC | 0 (0) | 0 < 0 < 1 | 0 (16.7) | 52 | 505 (8.9%) |
| 92 | VITAMIN B12, ADDED | 1 (17.5) | 0 < 0 < 380 | 0 (18.1) | 28 | 5218 (91.7%) |
| 93 | ALPHA-TOCOPHEROL, ADDED | 0.1 (0.9) | 0 < 0 < 16.9 | 0 (12) | 11 | 5231 (91.9%) |
| 94 | VITAMIN D2, ERGOCALCIFEROL | 0.3 (2) | 0 < 0 < 28.1 | 0 (6.3) | 22 | 5344 (93.9%) |
| 95 | FATTY ACIDS, SATURATED, 4:0, BUTANOIC | 0 (0.2) | 0 < 0 < 3.2 | 0 (5) | 274 | 1839 (32.3%) |
| 96 | FATTY ACIDS, SATURATED, 6:0, HEXANOIC | 0 (0.1) | 0 < 0 < 2 | 0 (4.9) | 224 | 1816 (31.9%) |
| 97 | ALPHA CAROTENE | 40.8 (387) | 0 < 0 < 14251 | 0 (9.5) | 164 | 2340 (41.1%) |
| 98 | FATTY ACIDS, MONOUNSATURATED, 22:1c, DOCOSENOIC | 0 (0) | 0 < 0 < 1.1 | 0 (5.7) | 100 | 2912 (51.2%) |
| 99 | FATTY ACIDS, POLYUNSATURATED, 18:3i, LINOLENIC, OCTADECATRIENOIC | 0 (0) | 0 < 0 < 0.3 | 0 (4.7) | 54 | 4419 (77.7%) |
| 100 | FATTY ACIDS, MONOUNSATURATED, 22:1t, DOCOSENOIC | 0 (0) | 0 < 0 < 0.1 | 0 (18.5) | 16 | 2983 (52.4%) |
| 101 | SUCROSE | 2 (7.3) | 0 < 0 < 99.8 | 0.4 (3.7) | 487 | 3044 (53.5%) |
| 102 | GLUCOSE | 0.8 (2.5) | 0 < 0 < 35.8 | 0.5 (3.2) | 399 | 3051 (53.6%) |
| 103 | FRUCTOSE | 0.7 (2.5) | 0 < 0 < 55.6 | 0.3 (3.6) | 387 | 3055 (53.7%) |
| 104 | LACTOSE | 0.3 (1.2) | 0 < 0 < 13.2 | 0 (4) | 225 | 3076 (54.1%) |
| 105 | MALTOSE | 0.2 (0.8) | 0 < 0 < 16.4 | 0 (3.9) | 217 | 3098 (54.4%) |
| 106 | GALACTOSE | 0 (0.5) | 0 < 0 < 19.9 | 0 (14.1) | 53 | 3122 (54.9%) |
| 107 | FATTY ACIDS, SATURATED, 20:0, EICOSANOIC | 0 (0.2) | 0 < 0 < 4.6 | 0 (4.6) | 183 | 3649 (64.1%) |
| 108 | FATTY ACIDS, SATURATED, 22:0, DOCOSANOIC | 0 (0.1) | 0 < 0 < 3.7 | 0 (5.8) | 133 | 3691 (64.9%) |
| 109 | FATTY ACIDS, MONOUNSATURATED, 14:1, TETRADECENOIC | 0 (0.1) | 0 < 0 < 1.8 | 0 (4) | 156 | 3666 (64.4%) |
| 110 | FATTY ACIDS, MONOUNSATURATED, 20:1, EICOSENOIC | 0.1 (0.6) | 0 < 0 < 15 | 0 (6.3) | 365 | 1759 |
(30.9%)
111 FATTY ACIDS, SATURATED, 15:0, PENTADECANOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.9
IQR (CV) : 0 (2.9)
121 distinct values 3772
(66.3%)
112 FATTY ACIDS, SATURATED, 17:0, HEPTADECANOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 0.8
IQR (CV) : 0 (2)
189 distinct values 3723
(65.4%)
113 FATTY ACIDS, SATURATED, 24:0, TETRACOSANOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 4.7
IQR (CV) : 0 (9.4)
91 distinct values 3949
(69.4%)
114 STARCH
[numeric]
Mean (sd) : 4 (11.4)
min < med < max:
0 < 0 < 73.3
IQR (CV) : 0 (2.9)
360 distinct values 3755
(66.0%)
115 BETA-TOCOPHEROL
[numeric]
Mean (sd) : 0.1 (0.5)
min < med < max:
0 < 0 < 10.5
IQR (CV) : 0.1 (5.1)
65 distinct values 4929
(86.6%)
116 GAMMA-TOCOPHEROL
[numeric]
Mean (sd) : 2.3 (5.7)
min < med < max:
0 < 0.2 < 65.2
IQR (CV) : 1.7 (2.5)
274 distinct values 4922
(86.5%)
117 DELTA-TOCOPHEROL
[numeric]
Mean (sd) : 0.4 (1.3)
min < med < max:
0 < 0 < 15.4
IQR (CV) : 0.2 (3.2)
148 distinct values 4928
(86.6%)
118 FATTY ACIDS, MONOUNSATURATED, 16:1t, HEXADECENOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 6.1
IQR (CV) : 0 (19.6)
73 distinct values 3959
(69.6%)
119 FATTY ACIDS, MONOUNSATURATED, 18:1t, OCTADECENOIC
[numeric]
Mean (sd) : 0.1 (0.7)
min < med < max:
0 < 0 < 20.2
IQR (CV) : 0.1 (5.6)
295 distinct values 4118
(72.4%)
120 FATTY ACIDS, POLYUNSATURATED, 18:2i, LINOLEIC, OCTADECADIENOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 2.3
IQR (CV) : 0 (3.6)
140 distinct values 4332
(76.1%)
121 FATTY ACIDS, MONOUNSATURATED, 24:1c, TETRACOSENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.6
IQR (CV) : 0 (7.9)
45 distinct values 4148
(72.9%)
122 FATTY ACIDS, MONOUNSATURATED, 16:1c, HEXADECENOIC
[numeric]
Mean (sd) : 0.1 (0.3)
min < med < max:
0 < 0 < 6.9
IQR (CV) : 0.1 (2.4)
396 distinct values 3923
(68.9%)
123 FATTY ACIDS, POLYUNSATURATED, 20:2 c,c EICOSADIENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.7
IQR (CV) : 0 (3.1)
128 distinct values 3854
(67.7%)
124 FATTY ACIDS, MONOUNSATURATED, 18:1c, OCTADECENOIC
[numeric]
Mean (sd) : 4.7 (72.2)
min < med < max:
0 < 1.1 < 2845
IQR (CV) : 3.2 (15.5)
1066 distinct values 4132
(72.6%)
125 FATTY ACIDS, MONOUNSATURATED, 17:1, HEPTADECENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 1.1
IQR (CV) : 0 (2.7)
135 distinct values 3903
(68.6%)
126 FATTY ACIDS, TOTAL TRANS-MONOENOIC
[numeric]
Mean (sd) : 0.1 (0.7)
min < med < max:
0 < 0 < 20.2
IQR (CV) : 0.1 (6.2)
285 distinct values 4249
(74.7%)
127 FATTY ACIDS, MONOUNSATURATED, 15:1, PENTADECENOIC
[numeric]
Mean (sd) : 0 (0.2)
min < med < max:
0 < 0 < 6
IQR (CV) : 0 (28)
27 distinct values 4050
(71.2%)
128 FATTY ACIDS, POLYUNSATURATED, CONJUGATED, 18:2 cla, LINOLEIC, OCTADECADIENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 1.1
IQR (CV) : 0 (4.4)
90 distinct values 4331
(76.1%)
129 FATTY ACIDS, POLYUNSATURATED, 22:4 n-6, DOCOSATETRAENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.3
IQR (CV) : 0 (3.1)
66 distinct values 4229
(74.3%)
130 FATTY ACIDS, TOTAL TRANS-POLYENOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 2.5
IQR (CV) : 0 (3.8)
154 distinct values 4330
(76.1%)
131 CHOLINE, TOTAL
[numeric]
Mean (sd) : 39.1 (70.6)
min < med < max:
0 < 19 < 2403
IQR (CV) : 50.5 (1.8)
910 distinct values 2813
(49.4%)
132 BETAINE
[numeric]
Mean (sd) : 10.6 (31.5)
min < med < max:
0 < 3.9 < 630
IQR (CV) : 10.1 (3)
257 distinct values 4589
(80.7%)
133 FATTY ACIDS, POLYUNSATURATED, TOTAL OMEGA N-3
[numeric]
Mean (sd) : 0.5 (2.4)
min < med < max:
0 < 0.1 < 53.4
IQR (CV) : 0.2 (4.9)
548 distinct values 3717
(65.3%)
134 FATTY ACIDS, POLYUNSATURATED, TOTAL OMEGA N-6
[numeric]
Mean (sd) : 3.1 (23.3)
min < med < max:
0 < 0.5 < 953
IQR (CV) : 1.4 (7.6)
1055 distinct values 3711
(65.2%)
135 ASPARTAME
[numeric]
Mean (sd) : 51.1 (403)
min < med < max:
0 < 0 < 3722
IQR (CV) : 0 (7.9)
0 : 82 (94.3%)
37 : 1 ( 1.1%)
42 : 1 ( 1.1%)
52 : 1 ( 1.1%)
597 : 1 ( 1.1%)
3722 : 1 ( 1.1%)
5603
(98.5%)
136 TOTAL PLANT STEROL
[numeric]
Mean (sd) : 26.4 (80.7)
min < med < max:
0 < 0 < 1190
IQR (CV) : 14 (3.1)
117 distinct values 4995
(87.8%)
137 MANNITOL
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.2
IQR (CV) : 0 (17.6)
3 distinct values 4313
(75.8%)
138 SORBITOL
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 2.3
IQR (CV) : 0 (14)
9 distinct values 4304
(75.6%)
139 STIGMASTEROL
[numeric]
Mean (sd) : 1.3 (5.9)
min < med < max:
0 < 0 < 59
IQR (CV) : 0 (4.6)
26 distinct values 5183
(91.1%)
140 TOTAL MONOSACCARIDES
[numeric]
Mean (sd) : 0.8 (2.7)
min < med < max:
0 < 0 < 30.6
IQR (CV) : 0.1 (3.3)
267 distinct values 3810
(67.0%)
141 TOTAL DISACCHARIDES
[numeric]
Mean (sd) : 1.5 (4.8)
min < med < max:
0 < 0 < 47.2
IQR (CV) : 0 (3.3)
295 distinct values 3824
(67.2%)
142 BETA-SITOSTEROL
[numeric]
Mean (sd) : 14 (47.1)
min < med < max:
0 < 0 < 621
IQR (CV) : 0 (3.4)
54 distinct values 5187
(91.2%)
143 HYDROXYPROLINE
[numeric]
Mean (sd) : 0.1 (0.1)
min < med < max:
0 < 0 < 0.7
IQR (CV) : 0.2 (1.3)
197 distinct values 5083
(89.3%)
144 FATTY ACIDS, SATURATED, 13:0 TRIDECANOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.1
IQR (CV) : 0 (10.7)
10 distinct values 5241
(92.1%)
145 FATTY ACIDS, POLYUNSATURATED, 21:5
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.4
IQR (CV) : 0 (11.5)
12 distinct values 4734
(83.2%)
146 FATTY ACIDS, MONOUNSATURATED, 24:1undifferentiated, TETRACOSENOIC
[numeric]
Mean (sd) : 0 (0.1)
min < med < max:
0 < 0 < 3
IQR (CV) : 0 (19.8)
32 distinct values 4550
(80.0%)
147 FATTY ACIDS, MONOUNSATURATED, 12:1, LAUROLEIC
[numeric]
1 distinct value
0 : 351 (100.0%)
5339
(93.8%)
148 FATTY ACIDS, POLYUNSATURATED, 22:3,
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.1
IQR (CV) : 0 (13.6)
10 distinct values 4754
(83.6%)
149 FATTY ACIDS, POLYUNSATURATED, 22:2, DOCOSADIENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0
IQR (CV) : 0 (15.5)
4 distinct values 4690
(82.4%)
150 FATTY ACIDS, POLYUNSATURATED, 18:2t,t , OCTADECADIENENOIC
[numeric]
Mean (sd) : 0 (0)
min < med < max:
0 < 0 < 0.5
IQR (CV) : 0 (5.9)
59 distinct values 4622
(81.2%)
151 CAMPESTEROL
[numeric]
Mean (sd) : 3.8 (16)
min < med < max:
0 < 0 < 189
IQR (CV) : 0 (4.2)
27 distinct values 5400
(94.9%)
152 BIOTIN
[numeric]
Mean (sd) : 6.1 (6.5)
min < med < max:
0 < 3.5 < 31.6
IQR (CV) : 7.2 (1.1)
71 distinct values 5585
(98.2%)
153 NA
[numeric]
1 distinct value 1 distinct values 5689
(100.0%)
154 OXALIC ACID
[numeric]
Mean (sd) : 0.3 (0.4)
min < med < max:
0 < 0.1 < 1.7
IQR (CV) : 0.3 (1.4)
27 distinct values 5639
(99.1%)

Missing values in clean dataset

Citations for packages used

Analyses were conducted using the R statistical language (version 3.6.0; R Core Team, 2019) on macOS Mojave 10.14.6, using the packages GGally (version 2.0.0; Barret Schloerke et al., 2020), gtsummary (version 1.3.6; Daniel Sjoberg et al., 2021), summarytools (version 0.9.8; Dominic Comtois, 2020), Matrix (version 1.2.17; Douglas Bates and Martin Maechler, 2019), RColorBrewer (version 1.1.2; Erich Neuwirth, 2014), ggplot2 (version 3.3.3; Hadley Wickham, 2016), tidyverse (version 1.2.1; Hadley Wickham, 2017), stringr (version 1.4.0; Hadley Wickham, 2019), tidyr (version 1.1.2; Hadley Wickham, 2020), forcats (version 0.5.1; Hadley Wickham, 2021), readr (version 1.3.1; Hadley Wickham, Jim Hester and Romain Francois, 2018), dplyr (version 1.0.2; Hadley Wickham et al., 2020), stargazer (version 5.2.2; Marek Hlavac, 2018), wordcloud (version 2.6; Ian Fellows, 2018), tm (version 0.7.8; Ingo Feinerer and Kurt Hornik, 2020), glmnet (version 4.1.1; Jerome Friedman et al., 2010), car (version 3.0.3; John Fox and Sanford Weisberg, 2019), carData (version 3.0.2; John Fox, Sanford Weisberg and Brad Price, 2018), here (version 1.0.1; Kirill Müller, 2020), tibble (version 3.1.0; Kirill Müller and Hadley Wickham, 2021), NLP (version 0.2.1; Kurt Hornik, 2020), purrr (version 0.3.4; Lionel Henry and Hadley Wickham, 2020), sjPlot (version 2.8.6; Daniel Lüdecke, 2020), report (version 0.3.0; Makowski et al., 2020), data.table (version 1.12.2; Matt Dowle and Arun Srinivasan, 2019), varhandle (version 2.0.5; Mehrad Mahmoudian, 2020), SnowballC (version 0.7.0; Milan Bouchet-Valat, 2020), imputeTS (version 3.2; Steffen Moritz and Thomas Bartz-Beielstein, 2017), pacman (version 0.5.1; Rinker et al., 2017), corrplot (version 0.84; Taiyun Wei and Viliam Simko, 2017) and pROC (version 1.17.0.1; Xavier Robin et al., 2011).

Another way to handle NAs

  • Remove variables with more than 50% missing values, then apply mean imputation to the remaining variables.
  • Apply the same analytic approach and compare the results with our main findings: does removing variables with many NAs change the results?
  1. Data Preparation
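
As a rough illustration of this preparation step, the sketch below drops variables with more than 50% missing values and then fills the remaining gaps with column means. It uses base R only; the data frame `nutrients`, its column names, and its values are made up for the example and are not taken from the Canadian Nutrient File.

```r
# Toy stand-in for the nutrient table (hypothetical column names and values)
nutrients <- data.frame(
  protein = c(10, NA, 30, 40),
  biotin  = c(NA, NA, NA, 2),
  sodium  = c(100, 200, NA, 400)
)

# Share of missing values in each column
na_share <- colMeans(is.na(nutrients))

# Keep only variables with at most 50% missing values
kept <- nutrients[, na_share <= 0.5, drop = FALSE]

# Replace each remaining NA with the mean of its column
imputed <- as.data.frame(lapply(kept, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}))

imputed
```

In this toy table `biotin` is 75% missing and is dropped, while the single gaps in `protein` and `sodium` are filled with their column means.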

2.1.1 Regular K-means Clustering

Run k-means clustering

Size of each cluster: 9, 21, 9, 23, 97, 1785, 3687, 59
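
Cluster sizes like those above could be obtained along the following lines. This is a minimal sketch on simulated data, not the exact call used in the analysis: the matrix `foods`, its column names, the standardisation step, and the choice of `nstart` are assumptions, and the toy example uses two clusters rather than the eight listed above.

```r
set.seed(1)  # k-means starts from random centres, so fix the seed

# Toy stand-in for the imputed nutrient matrix: 60 foods x 3 nutrients
foods <- rbind(
  matrix(rnorm(90, mean = 0), ncol = 3),
  matrix(rnorm(90, mean = 6), ncol = 3)
)
colnames(foods) <- c("protein", "fat", "sugar")  # hypothetical names

# Standardise so that nutrients on large scales do not dominate the distance
# (assumption: the source does not state whether the data were scaled)
foods_scaled <- scale(foods)

# k = 2 for the toy data; several random starts reduce the risk of a poor solution
fit <- kmeans(foods_scaled, centers = 2, nstart = 25)

# Number of foods assigned to each cluster
table(fit$cluster)
```

The vector `fit$size` holds the same counts as the table and can be printed directly.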
Plotting: two random variables

Plotting: wordcloud of each cluster

Mean calories per cluster
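
One simple way to compute a per-cluster mean is sketched below with base R's `aggregate()`. The data frame `foods`, the column names `energy_kcal` and `cluster`, and all values are invented for illustration; in the real analysis the cluster labels would come from the fitted k-means object.

```r
# Toy data: energy per 100 g and a cluster label per food (made-up values)
foods <- data.frame(
  energy_kcal = c(50, 70, 400, 420, 900),
  cluster     = c(1, 1, 2, 2, 3)
)

# Mean energy within each cluster
mean_kcal <- aggregate(energy_kcal ~ cluster, data = foods, FUN = mean)
mean_kcal
```

For this toy input the cluster means are 60, 410, and 900 kcal.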

Most important nutrients (PCA)

2.1.2 Spectrum Clustering

First, run PCA

Most important nutrients used for spectrum clustering (PC1, PC2)

Run k-means clustering
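
The two steps named above (PCA first, then k-means on the leading principal components) might look roughly like this in base R. It is a sketch under assumptions: the simulated matrix `foods`, its column names, the use of exactly two components, and `centers = 2` are illustrative choices, not the settings used for the reported results.

```r
set.seed(1)

# Toy stand-in for the nutrient matrix: 60 foods x 4 nutrients
foods <- rbind(
  matrix(rnorm(120, mean = 0), ncol = 4),
  matrix(rnorm(120, mean = 5), ncol = 4)
)
colnames(foods) <- c("protein", "fat", "sugar", "fibre")  # hypothetical names

# PCA on centred, unit-variance variables
pca <- prcomp(foods, center = TRUE, scale. = TRUE)

# Nutrients ranked by the absolute size of their loading on PC1
sort(abs(pca$rotation[, "PC1"]), decreasing = TRUE)

# Scores of each food on the first two principal components
scores <- pca$x[, 1:2]

# k-means on the PC scores instead of the full nutrient matrix
fit <- kmeans(scores, centers = 2, nstart = 25)
table(fit$cluster)
```

Clustering on a few component scores rather than on all nutrient columns reduces the influence of noisy, sparsely measured variables, at the cost of discarding the variance carried by the later components.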

Plotting: two PCs (note: all plots are exploratory)

Plotting: wordcloud

Mean calories per cluster

Most important nutrients (PCA)

References